Abstract: Social networking sites are a rich source of Big Data for mining people’s opinions. Public APIs offered by sites such as Twitter provide useful data for examining a writer’s attitude toward a particular topic, product, and so on. To discern people’s opinions, tweets are tagged as positive, negative, or neutral. This project provides an effective mechanism for opinion mining by designing an end-to-end pipeline with the help of Apache Flume, Apache HDFS, Apache Oozie, and Apache Hive. To make this process near real time, we study the workaround of ignoring Flume tmp files and removing the default wait condition from the Oozie job configuration. The underlying architecture is not restricted to opinion mining; it supports a wide range of other applications. This paper explores a few of the use cases that can be developed into working models.
Keywords: Opinion mining, Big data, Hadoop.